Introduction

Through an FOI request, I have obtained the download statistics (COUNTER R5 format) for ScienceDirect usage at the University of Oxford for the calendar years 2019 and 2020. These data exclude any Gold OA articles.

The raw data are available in the data folder. Below is some simple analysis of the data; please let me know if you see any errors in the analysis, particularly in the handling of the COUNTER R5 data. I have chosen to analyse the ‘Total_Item_Requests’, rather than the ‘Unique_Item_Requests’.
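The distinction matters when aggregating: a COUNTER R5 title report contains one row per title and metric type, so rows must be filtered to a single Metric_Type before summing. The original analysis is in Rmarkdown, but the filtering step can be sketched in Python as follows (the column names Title, Metric_Type and Reporting_Period_Total follow the usual COUNTER TR layout, and the toy numbers are invented, not the Oxford data):

```python
import pandas as pd

# Miniature stand-in for a COUNTER R5 TR extract; the exact column
# names may differ slightly depending on how the report was exported.
tr = pd.DataFrame({
    "Title": ["Cell", "Cell", "The Lancet", "The Lancet", "Journal X"],
    "Metric_Type": ["Total_Item_Requests", "Unique_Item_Requests",
                    "Total_Item_Requests", "Unique_Item_Requests",
                    "Total_Item_Requests"],
    "Reporting_Period_Total": [500, 400, 300, 250, 1],
})

# Keep only the metric analysed here, then sum per journal.
requests = (tr[tr["Metric_Type"] == "Total_Item_Requests"]
            .groupby("Title")["Reporting_Period_Total"]
            .sum())
print(requests.to_dict())
```

Without the Metric_Type filter, Total and Unique counts for the same title would be added together, inflating the totals.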

For each of the two years I present a searchable table of per-journal downloads, ranked in decreasing order of downloads. These data are then converted into a cumulative histogram of access.

I have made no attempt currently to break down the journals into different groupings (e.g. the Freedom collection). This file is available as an Rmarkdown file.

2019 data

WARNING: Please ignore 2019 data; it is incorrect.

Total number of downloads: 1,999,121 from 2,935 journals.

Note: these tables are searchable; just type in e.g. a journal name in the search box in the top right and it will narrow down the selection; likewise, you can sort by other columns by clicking on an arrow next to the column name.

The columns in the table are:

  • Title: name of the journal
  • requests: yearly total number of requests from that title
  • n: number of distinct rows in the CSV for this title.
  • rank: rank of the requests (highest first).
  • pct: requests for this journal as a percentage of total requests across all journals.
  • c_pct: cumulative version of that percentage, accumulated in rank order.
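As a rough sketch of how such a table could be derived (in Python rather than the R of the original notebook; the toy data and the Reporting_Period_Total column name are assumptions, not the Oxford figures):

```python
import pandas as pd

# Toy per-row data standing in for the COUNTER CSV.
df = pd.DataFrame({
    "Title": ["A", "A", "B", "C", "C", "C"],
    "Reporting_Period_Total": [10, 20, 60, 5, 3, 2],
})

# requests: yearly total per title; n: number of CSV rows per title.
tab = (df.groupby("Title")
         .agg(requests=("Reporting_Period_Total", "sum"),
              n=("Reporting_Period_Total", "size"))
         .reset_index())

# rank (highest requests first), percentage, and cumulative percentage.
tab = tab.sort_values("requests", ascending=False).reset_index(drop=True)
tab["rank"] = tab.index + 1
tab["pct"] = 100 * tab["requests"] / tab["requests"].sum()
tab["c_pct"] = tab["pct"].cumsum()
print(tab)
```

The c_pct column necessarily ends at 100%, which is what the cumulative plots below trace out across the full ranked list.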

2020 data

Total number of downloads: 1,999,121 from 2,935 journals.

Cumulative plots

These plots for 2019 and 2020 indicate that the data seem to follow the Pareto principle: 20% of journals in the collection account for about 80% of downloads. You can use your mouse to hover over the curves to see individual journal titles and their data.
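The 80/20 claim can be checked directly from the ranked request totals: sort them in decreasing order, take the top 20% of titles, and compute their share of all requests. A minimal sketch with invented numbers (not the Oxford data):

```python
# Per-journal request totals (illustrative, deliberately skewed).
requests = sorted([1000, 800, 50, 40, 30, 20, 15, 10, 5, 5], reverse=True)

top_n = max(1, int(0.2 * len(requests)))       # top 20% of journals
share = sum(requests[:top_n]) / sum(requests)  # their share of downloads
print(f"Top {top_n} journals account for {share:.0%} of downloads")
```

On real data, `share` near 0.8 for the top 20% of titles is what the Pareto-like shape of the cumulative curves suggests.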

Discussion

It is unsurprising to see that the most heavily accessed journals are Cell Press and Lancet titles.

Statistics like these are interesting to view, but should be viewed with caution (e.g. Wood-Doughty et al 2019). I believe they are useful to help inform the value of a big deal versus unbundling (Thornton and Brundy 2021). However, over-reliance on simple metrics like this to select which journals to keep and which to remove may disproportionately affect smaller disciplines. Once such metrics drive decision making, they may also invite ‘gaming’, whereby researchers routinely download papers from their favourite journals simply to prevent them being cancelled. Further, I’d hope that such statistics may not even be relevant in a few years, if we can transition to more equitable models of publishing.

Questions / future work

  • Why are there so many journals (about 500) with only one download in the year? Why are there none with zero downloads?

  • Is it worth collecting these COUNTER R5 data from other UK institutions? Should they be freely available as a matter of routine?

  • Can this be combined with costs of titles to evaluate the cost of subsets of journals?

See the project home page for all source material and e.g. any GitHub issues.

References

Thornton JB, Brundy C (2021) Elsevier title level pricing: Dissecting the bowl of spaghetti. J Libr Sch Commun 9:2410. Available at: http://dx.doi.org/10.7710/2162-3309.2410.

Wood-Doughty A, Bergstrom T, Steigerwald DG (2019) Do Download Reports Reliably Measure Journal Usage? Trusting the Fox to Count Your Hens? Coll Res Libr 80:694. Available at: https://crl.acrl.org/index.php/crl/article/view/17824/19653 [Accessed October 16, 2021].